Abstract: We prove a martingale triangular array generalization of the Chow-Birnbaum-Marshall
inequality. The result is used to derive a strong law of large numbers for martingale
triangular arrays whose rows are asymptotically stable in a certain sense. To illustrate, we 
derive a simple proof, based on martingale arguments, of the consistency of kernel regression 
with dependent data. Another application can be found in [1] where the new inequality is 
used to prove a strong law of large numbers for adaptive Markov Chain Monte Carlo methods. 
1. Strong law of large numbers for martingale arrays

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $\mathbb{E}$ the expectation operator with respect to $\mathbb{P}$. Let $\{D_{n,i}, \mathcal{F}_{n,i}, 1 \le i \le n\}$, $n \ge 1$, be a martingale-difference array. That is, for each $n \ge 1$, $\{\mathcal{F}_{n,i}, 1 \le i \le n\}$ is a non-decreasing sequence of sub-sigma-algebras of $\mathcal{F}$ and, for any $1 \le i \le n$, $\mathbb{E}(|D_{n,i}|) < \infty$ and $\mathbb{E}(D_{n,i} \mid \mathcal{F}_{n,i-1}) = 0$. We assume throughout the paper that $\mathcal{F}_{n,0} = \{\emptyset, \Omega\}$ for all $n \ge 0$. We introduce the partial sums

\[ M_{n,k} := \sum_{i=1}^{k} D_{n,i}, \qquad 1 \le k \le n, \ n \ge 1. \]

For each $n \ge 1$, $\{(M_{n,k}, \mathcal{F}_{n,k}), 1 \le k \le n\}$ is a martingale. Let $\{c_n, n \ge 1\}$ be a non-increasing sequence of positive numbers. We are interested in conditions under which $c_n M_{n,n}$ converges almost surely to zero.
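As a purely numerical illustration of this setting, the following Python sketch builds a toy martingale-difference array and tracks $c_n M_{n,n}$ along one sample path. Everything in it is an assumption made for the example and is not taken from the results of this paper: the weights $1 + i/n$, the i.i.d. random signs $\xi_i$, the choice $c_n = 1/n$, and the helper name `row`. Because the weights are deterministic and the signs are centered and independent, each row is a martingale-difference sequence with respect to the common filtration generated by the signs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_max = 2000
xi = rng.choice([-1.0, 1.0], size=n_max)  # i.i.d. centered signs xi_1, ..., xi_{n_max}

def row(n):
    """Row n of a toy array: D_{n,i} = (1 + i/n) * xi_i for 1 <= i <= n."""
    i = np.arange(1, n + 1)
    return (1.0 + i / n) * xi[:n]

# c_n = 1/n is positive and non-increasing; scaled[n-1] holds c_n * M_{n,n}.
scaled = np.array([row(n).sum() / n for n in range(1, n_max + 1)])

# Largest value of |c_n M_{n,n}| over the last 100 rows of the array.
tail_max = np.abs(scaled[-100:]).max()
print(tail_max)
```

On a typical run the printed tail value is small, which is what one would expect if $c_n M_{n,n}$ tends to zero; a single simulated path is of course only suggestive.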

Martingales and martingale arrays play an important role in probability and statistics as
valuable tools for limit theory. Much is known about the limit theory of martingales (see e.g. [4]), but
comparatively little work has been done on the law of large numbers for martingale arrays.
One of the most effective approaches to proving the strong law of large numbers for martingales is
via the Kolmogorov-type inequality for martingales obtained by Chow ([3]) and Birnbaum-Marshall
([2]).

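Theorem 1.1 (Chow-Birnbaum-Marshall's inequality). Let $\{(S_k, \mathcal{F}_k), k \ge 1\}$ be a sub-martingale and $\{c_k, k \ge 1\}$ a non-increasing real-valued sequence. For $p \ge 1$ and $n \le N$,

\[ \mathbb{P}\Big( \sup_{n \le m \le N} c_m |S_m| > 1 \Big) \le c_N^p\, \mathbb{E}[|S_N|^p] + \sum_{m=n}^{N-1} (c_m^p - c_{m+1}^p)\, \mathbb{E}[|S_m|^p]. \]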

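The following theorem gives an extension to martingale arrays. We introduce the sequences

\[ S_{n,k} = M_{n,k} = \sum_{i=1}^{k} D_{n,i}, \quad 1 \le k \le n, \qquad \text{and} \qquad S_{n,k} = \sum_{i=1}^{n} D_{n,i} + \sum_{j=n+1}^{k} D_{j,j}, \quad k > n, \]

\[ R_n := \sum_{i=1}^{n-1} (D_{n,i} - D_{n-1,i}). \]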

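Theorem 1.2. Let $\{D_{n,i}, \mathcal{F}_{n,i}, 1 \le i \le n\}$, $n \ge 1$, be a martingale-difference array and $\{c_n, n \ge 1\}$ a non-increasing sequence of positive numbers. Assume that $\mathcal{F}_{n,i} = \mathcal{F}_i$ for all $i, n$. For $n \le N$, $p \ge 1$ and $\lambda > 0$,

\[ 2^{-p} \lambda^p\, \mathbb{P}\Big( \max_{n \le m \le N} c_m |M_{m,m}| > \lambda \Big) \le c_N^p\, \mathbb{E}(|S_{n,N}|^p) + \sum_{j=n}^{N-1} (c_j^p - c_{j+1}^p)\, \mathbb{E}(|S_{n,j}|^p) + \mathbb{E}\Big[ \Big( \sum_{j=n+1}^{N} c_j |R_j| \Big)^p \Big]. \tag{1} \]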



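Proof. For $n < m \le N$, we have $M_{m,m} = M_{m-1,m-1} + D_{m,m} + R_m$, leading to the decomposition

\[ M_{m,m} = M_{n,n} + \sum_{j=n+1}^{m} D_{j,j} + \sum_{j=n+1}^{m} R_j = S_{n,m} + \sum_{j=n+1}^{m} R_j. \]

We note that $\{(S_{n,m}, \mathcal{F}_m), 1 \le m \le N\}$ is a martingale. We also introduce

\[ Z_m := c_n^p |S_{n,n}|^p + \sum_{j=n+1}^{m} c_j^p \big( |S_{n,j}|^p - |S_{n,j-1}|^p \big), \qquad n \le m \le N. \]

It is easy to check (by summation by parts) that $Z_m$ has the alternative form

\[ Z_m = \sum_{j=n}^{m-1} (c_j^p - c_{j+1}^p)\, |S_{n,j}|^p + c_m^p |S_{n,m}|^p. \]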



imsart ver. 2005/10/19 file: martarray.tex date: May 17, 2009 



/SLLN for martingales arrays 3 

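Since $\{(|S_{n,m}|^p, \mathcal{F}_m), 1 \le m \le N\}$ is a sub-martingale and $\{c_n, n \ge 1\}$ is non-increasing, we have $\mathbb{E}(Z_m \mid \mathcal{F}_{m-1}) \ge Z_{m-1}$, that is, $\{(Z_m, \mathcal{F}_m), n \le m \le N\}$ is a sub-martingale. For $n \le m \le N$, we introduce the sets $A_m^{(1)} := \{c_m |S_{n,m}| > \lambda/2\}$, $A_m^{(2)} := \{c_m |\sum_{j=n+1}^{m} R_j| > \lambda/2\}$ and $B_m := \{c_j |M_{j,j}| \le \lambda, \ j = n, \ldots, m-1, \ c_m |M_{m,m}| > \lambda\}$. The sets $B_m$ are disjoint and, by the decomposition of $M_{m,m}$, $B_m \subseteq A_m^{(1)} \cup A_m^{(2)}$. We have:

\begin{align*}
\lambda^p\, \mathbb{P}\Big( \max_{n \le m \le N} c_m |M_{m,m}| > \lambda \Big)
&= \lambda^p\, \mathbb{E}\Big( \sum_{m=n}^{N} 1_{B_m} \Big)
\le \lambda^p\, \mathbb{E}\Big( \sum_{m=n}^{N} 1_{B_m} 1_{A_m^{(1)}} \Big) + \lambda^p\, \mathbb{E}\Big( \sum_{m=n}^{N} 1_{B_m} 1_{A_m^{(2)}} \Big) \\
&\le \sum_{m=n}^{N} \mathbb{E}\big[ 2^p c_m^p |S_{n,m}|^p\, 1_{B_m} \big] + 2^p\, \mathbb{E}\Big[ \sum_{m=n}^{N} 1_{B_m} \Big( c_m \Big| \sum_{j=n+1}^{m} R_j \Big| \Big)^p \Big] \\
&\le 2^p \sum_{m=n}^{N} \mathbb{E}\big[ Z_m 1_{B_m} \big] + 2^p\, \mathbb{E}\Big[ \Big( \sum_{j=n+1}^{N} c_j |R_j| \Big)^p \Big] \\
&\le 2^p\, \mathbb{E}\Big[ Z_N + \Big( \sum_{j=n+1}^{N} c_j |R_j| \Big)^p \Big].
\end{align*}

The third inequality uses $c_m^p |S_{n,m}|^p \le Z_m$ (from the alternative form of $Z_m$) and $c_m \le c_j$ for $j \le m$; the last one uses the sub-martingale property of $\{Z_m\}$, the fact that $B_m \in \mathcal{F}_m$, and $Z_N \ge 0$. Since $\mathbb{E}[Z_N] = c_N^p\, \mathbb{E}(|S_{n,N}|^p) + \sum_{j=n}^{N-1} (c_j^p - c_{j+1}^p)\, \mathbb{E}(|S_{n,j}|^p)$, this is (1).

□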
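The two algebraic identities used in the proof, namely the decomposition $M_{m,m} = S_{n,m} + \sum_{j=n+1}^{m} R_j$ and the summation-by-parts form of $Z_m$, can be checked numerically. The Python sketch below does so on an arbitrary array of Gaussian numbers, since neither identity needs any martingale structure. The weights $c_j = 1/j$, the exponent $p = 2$, the sizes $N = 8$, $n = 3$ and all function names are illustrative assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, p = 8, 3, 2.0

# D[r][i] stores D_{r,i} for 1 <= i <= r; index 0 is unused padding.
D = [np.concatenate(([0.0], rng.normal(size=r))) for r in range(N + 1)]

def c(j):
    """Illustrative non-increasing positive weights c_j = 1/j."""
    return 1.0 / j

def M(r, k):
    """Partial sum M_{r,k} = D_{r,1} + ... + D_{r,k}."""
    return D[r][1:k + 1].sum()

def R(r):
    """R_r = sum over 1 <= i <= r-1 of (D_{r,i} - D_{r-1,i})."""
    return (D[r][1:r] - D[r - 1][1:r]).sum()

def S(r, k):
    """S_{r,k}: row r up to column r, then the diagonal terms D_{j,j}."""
    if k <= r:
        return M(r, k)
    return M(r, r) + sum(D[j][j] for j in range(r + 1, k + 1))

def Z_def(m):
    """Z_m written with increments of |S_{n,j}|^p."""
    inc = sum(c(j) ** p * (abs(S(n, j)) ** p - abs(S(n, j - 1)) ** p)
              for j in range(n + 1, m + 1))
    return c(n) ** p * abs(S(n, n)) ** p + inc

def Z_alt(m):
    """Z_m after summation by parts."""
    head = sum((c(j) ** p - c(j + 1) ** p) * abs(S(n, j)) ** p
               for j in range(n, m))
    return head + c(m) ** p * abs(S(n, m)) ** p

# Largest violation of each identity over m = n, ..., N.
gap_decomposition = max(
    abs(M(m, m) - S(n, m) - sum(R(j) for j in range(n + 1, m + 1)))
    for m in range(n, N + 1))
gap_Z = max(abs(Z_def(m) - Z_alt(m)) for m in range(n, N + 1))
print(gap_decomposition, gap_Z)
```

Both printed gaps should be at the level of floating-point rounding error.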



In many situations, one deals with martingale arrays whose rows are asymptotically stable in
the sense that the sequence $\mathbb{E}[|R_n|]$ converges to zero as $n$ increases to infinity. Theorem 1.2 can
be used to prove a strong law of large numbers for such martingale arrays.

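Corollary 1.1. Let $\{D_{n,i}, \mathcal{F}_{n,i}, 1 \le i \le n\}$, $n \ge 1$, be a martingale-difference array and $\{c_n, n \ge 1\}$ a non-increasing sequence of positive numbers. Assume that $\mathcal{F}_{n,i} = \mathcal{F}_i$ for all $i, n$. Suppose that there exists $p \ge 1$ such that for any $n_0 \ge 1$

\[ \lim_{N \to \infty} \Big( c_N^p\, \mathbb{E}[|S_{n_0,N}|^p] + \sum_{k=n_0}^{N-1} (c_k^p - c_{k+1}^p)\, \mathbb{E}[|S_{n_0,k}|^p] \Big) = 0, \qquad \text{and} \qquad \sum_{n \ge 1} c_n\, \mathbb{E}^{1/p}[|R_n|^p] < \infty. \tag{2} \]

Then $c_n M_{n,n}$ converges almost surely to zero.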

Remark 1.1. With respect to the process $\{R_n\}$ in Theorem 1.2, we point out that, because of
the assumption $\mathcal{F}_{n,i} = \mathcal{F}_i$, the sequence $\{(\sum_{j=1}^{k} (D_{n,j} - D_{n-1,j}), \mathcal{F}_k), 1 \le k \le n-1\}$ is also a
martingale.

The conditions in Corollary 1.1 are expressed in terms of moments of martingales. These
moments can be bounded by moments of the martingale differences. We give one such
bound in the next proposition. It is a consequence of Burkholder's inequality ([4], Theorem
2.10) and some classical convexity inequalities. We omit the details.

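Proposition 1.1. Let $\{D_{n,i}, \mathcal{F}_{n,i}, 1 \le i \le n\}$, $n \ge 1$, be a martingale-difference array. For any
$p > 1$,

\[ \mathbb{E}[|M_{n,k}|^p] \le C\, k^{\max(1, p/2) - 1} \sum_{i=1}^{k} \mathbb{E}[|D_{n,i}|^p], \tag{3} \]

where $C = (18\, p\, q^{1/2})^p$, $p^{-1} + q^{-1} = 1$.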

2. Kernel regression with Markov chains 

As an application, we prove the strong consistency of the Nadaraya-Watson estimator for nonparametric regression where the data arise from a non-stationary Markov chain. The approach of the proof can be adapted to study other kernel methods or other statistical smoothing procedures with dependent data. We assume the following structure for the data. $\{(X_i, e_i), i \ge 0\}$ is a joint $\mathbb{R}^2$-valued Markov chain on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ such that

\[ \mathbb{P}\big( (X_n, e_n) \in A \times B \mid (X_k, e_k), \ k \le n-1 \big) = \int_A p(X_{n-1}, z)\, q(z, B)\, dz, \]

for transition probability densities $p$ and $q$. Here $p$ is the transition probability density of the marginal Markov chain $\{X_i, i \ge 0\}$ and $q(x, A) = \mathbb{P}(e_n \in A \mid X_n = x)$ is the transition probability density of the error term $e_n$. All densities are with respect to the Lebesgue measure, denoted $dx$. We assume that $p$ has an invariant distribution $\pi$ (that is, $\pi(x) = \int_{\mathbb{R}} \pi(y) p(y, x)\, dy$, $x \in \mathbb{R}$) and

\[ \int_{\mathbb{R}} \pi(x) \Big( \int_{\mathbb{R}} e\, q(x, e)\, de \Big) dx = 0. \tag{4} \]

We consider the dependent variable

\[ Y_i = r(X_i) + e_i, \qquad i \ge 0. \]

We are interested in estimating the regression function $r$. Note that the error terms are correlated and we do not assume that $\mathbb{E}(e_i) = 0$ unless, as assumed in (4), the Markov chain $\{X_i, i \ge 0\}$ is in stationarity. For the reader's convenience, we will sometimes use the notation $\mathbb{E}(U(Y) \mid X = x)$ to denote the integral $\int U(r(x) + e)\, q(x, e)\, de$, whenever such an integral is well-defined. A popular nonparametric estimator for $r$ is the Nadaraya-Watson estimator

\[ \hat{r}_n(x_0) = \frac{\sum_{i=1}^{n} Y_i\, K\big( \frac{x_0 - X_i}{h_n} \big)}{\sum_{i=1}^{n} K\big( \frac{x_0 - X_i}{h_n} \big)}, \]

where $K$ is the kernel (a nonnegative function such that $\int K(x)\, dx = 1$) and $h_n > 0$ the bandwidth. Let $\psi : \mathbb{R} \to \mathbb{R}$ be a measurable function. We study the almost sure convergence of

\[ \hat{r}_{n,\psi}(x_0) = \frac{1}{n h_n} \sum_{i=1}^{n} \psi(Y_i)\, K\Big( \frac{x_0 - X_i}{h_n} \Big), \]

as $n \to \infty$. We can then deduce the convergence of the Nadaraya-Watson estimator by setting
$\psi(x) = x$ for the numerator and $\psi(x) = 1$ for the denominator.
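To make the estimator concrete, here is a minimal Python sketch of the kernel average with a general $\psi$ and of the Nadaraya-Watson ratio obtained from it. The Gaussian kernel, the AR(1) chain $X_i = 0.5\, X_{i-1} + \text{noise}$, the regression function $r = \sin$, the noise level $0.3$, the bandwidth $h_n = n^{-0.2}$ and the function names `gaussian_kernel`, `r_psi` and `nadaraya_watson` are all assumptions chosen for the illustration; they are not taken from the paper. In this toy model the errors are independent of the chain, which is a simple special case of the structure assumed above.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density: nonnegative, bounded, Lipschitz, integrates to 1."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def r_psi(x0, X, Y, h, psi, kernel=gaussian_kernel):
    """Kernel average (1 / (n h)) * sum_i psi(Y_i) * K((x0 - X_i) / h)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return np.sum(psi(Y) * kernel((x0 - X) / h)) / (X.size * h)

def nadaraya_watson(x0, X, Y, h, kernel=gaussian_kernel):
    """Ratio of the kernel averages with psi(y) = y and psi(y) = 1."""
    num = r_psi(x0, X, Y, h, lambda y: y, kernel)
    den = r_psi(x0, X, Y, h, lambda y: np.ones_like(y), kernel)
    return num / den

# Toy data (assumed model): AR(1) chain started at 0, so not in stationarity.
rng = np.random.default_rng(1)
n = 20_000
X = np.empty(n + 1)
X[0] = 0.0
for i in range(1, n + 1):
    X[i] = 0.5 * X[i - 1] + rng.normal()
Y = np.sin(X) + 0.3 * rng.normal(size=n + 1)

h = n ** (-0.2)  # bandwidth n^(-beta) with beta = 0.2
estimate = nadaraya_watson(0.5, X[1:], Y[1:], h)
print(estimate, np.sin(0.5))
```

The two printed numbers should be close; the sketch only illustrates the definition and does not by itself verify any of the assumptions used in the consistency result.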

Let $\mu$ be the distribution of $X_0$, the initial distribution of the Markov chain. We write $P$ for
the Markov kernel induced by $p$, which operates on nonnegative bounded measurable functions
as $Pf(x) = \int_{\mathbb{R}} p(x, y) f(y)\, dy$. The iterates of $P$ are defined as $P^0 f(x) = f(x)$ and, for
$n \ge 1$, $P^n f(x) = P(P^{n-1} f)(x)$. We will assume that $P$ is geometrically ergodic. That is:

B1 $P$ is $\phi$-irreducible, aperiodic and there exist a function $V : \mathbb{R} \to [1, \infty)$, $\lambda \in (0, 1)$, $b \in (0, \infty)$
such that

\[ PV(x) \le \lambda V(x) + b\, 1_C(x), \]

for some small set $C$.

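This assumption is a well-known stability assumption for Markov kernels, extensively studied
in [5]. One important consequence of (B1) that we will use is the following. For any $\alpha \in (0, 1]$,
there exist $\rho \in (0, 1)$ and $C(\alpha) < \infty$ such that for all $n \ge 0$,

\[ \sup_{|f|_{V^\alpha} \le 1} \Big| P^n f(x) - \int_{\mathbb{R}} f(y) \pi(y)\, dy \Big| \le C(\alpha)\, \rho^n\, V^\alpha(x), \qquad x \in \mathbb{R}, \tag{6} \]

where $|f|_{V^\alpha} := \sup_{x \in \mathbb{R}} \frac{|f(x)|}{V^\alpha(x)}$. A proof can be found in [5], Chapter 15.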

We assume that $\mu(V) := \int_{\mathbb{R}} V(x) \mu(x)\, dx < \infty$. By iterating the drift condition (B1), it is easy
to see that

\[ \sup_{n \ge 0} \mathbb{E}[V(X_n)] \le \mu(V) + b/(1 - \lambda) < \infty. \tag{7} \]
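For completeness, the iteration behind (7) can be sketched as follows. This is one way to fill in the step, using only the Markov property, the drift condition (B1) and the bound $1_C \le 1$:

```latex
\begin{align*}
\mathbb{E}[V(X_n)]
  &= \mathbb{E}\big[\mathbb{E}[V(X_n) \mid X_{n-1}]\big]
   = \mathbb{E}[PV(X_{n-1})] \\
  &\le \lambda\, \mathbb{E}[V(X_{n-1})] + b \\
  &\le \lambda^n \mu(V) + b \sum_{k=0}^{n-1} \lambda^k \\
  &\le \mu(V) + \frac{b}{1-\lambda}.
\end{align*}
```

The third line follows by applying the second line $n$ times, and the last line uses $\lambda \in (0, 1)$ together with $\mathbb{E}[V(X_0)] = \mu(V)$.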

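On the function $\psi$, we assume that

\[ \sup_{x} V^{-1/2}(x) (1 + |x|)\, \mathbb{E}[|\psi(Y)| \mid X = x] < \infty, \qquad \text{and} \qquad \sup_{x} V^{-1}(x)\, \mathbb{E}[\psi^2(Y) \mid X = x] < \infty. \tag{8} \]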

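On the kernel $K$, we assume that

\[ \sup_{x} K(x) < \infty, \qquad \lim_{|x| \to \infty} |x| K(x) = 0, \qquad \text{and} \qquad \sup_{x \ne x'} \frac{|K(x) - K(x')|}{|x - x'|} < \infty. \tag{9} \]

On the sequence $\{h_n, n \ge 0\}$, we assume that:

\[ h_n \sim n^{-\beta}, \qquad \text{with } \beta \in (0, 1/4). \tag{10} \]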

Theorem 2.1. Assume (B1), (8)-(10) and that the function $x \mapsto \pi(x)\, \mathbb{E}(\psi(Y) \mid X = x)$ is continuous
at $x_0$. Then $\lim_{n \to \infty} \hat{r}_{n,\psi}(x_0) = \pi(x_0)\, \mathbb{E}(\psi(Y) \mid X = x_0)$ with $\mathbb{P}$-probability one.

Proof. Throughout the proof, $x_0 \in \mathbb{R}$ is fixed and $C$ will denote a finite constant whose actual
value might differ from one appearance to the next. Define $\mathcal{F}_n := \sigma\{(X_k, Y_k), k \le n\}$. For $h > 0$,
define $F_h(x, y) = \psi(y)\, K\big( \frac{x_0 - x}{h} \big)$, $f_h(x) = K\big( \frac{x_0 - x}{h} \big)\, \mathbb{E}[\psi(Y) \mid X = x]$, and

\[ g_h(x) = \sum_{j \ge 0} \bar{P}^j f_h(x), \]

where $\bar{P}^j f(x) := P^j f(x) - \int_{\mathbb{R}} f(y) \pi(y)\, dy$. By (8), the boundedness of $K$ and the geometric
ergodicity bound (6), $g_h$ is well-defined and satisfies $|g_h|_{V^{1/2}} \le C (1 - \rho)^{-1}$. It is also well
known that $g_h$ solves the Poisson equation for $f_h$ and $P$. In other words, we have

\[ g_h(x) - P g_h(x) = \bar{f}_h(x), \tag{11} \]

where $\bar{f}_h(x) = f_h(x) - \int_{\mathbb{R}} f_h(y) \pi(y)\, dy$.

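Similarly, define $H_h(x, y) = F_h(x, y) + P g_h(x)$. It is left to the reader to check that
$\mathbb{E}[H_h(X_n, Y_n) \mid X_{n-1} = x, Y_{n-1} = y] = P f_h(x) + P^2 g_h(x) = P g_h(x) + \int_{\mathbb{R}} f_h(z) \pi(z)\, dz$ (using (11)).
It follows that

\[ F_h(x, y) - \int_{\mathbb{R}} \pi(z) f_h(z)\, dz = H_h(x, y) - \mathbb{E}[H_h(X_n, Y_n) \mid X_{n-1} = x, Y_{n-1} = y], \qquad x, y \in \mathbb{R}. \tag{12} \]

Using (12), we can decompose $\hat{r}_{n,\psi}(x_0)$ as

\begin{align*}
\hat{r}_{n,\psi}(x_0) = {} & \frac{1}{h_n} \int_{\mathbb{R}} K\Big( \frac{x_0 - x}{h_n} \Big)\, \mathbb{E}[\psi(Y) \mid X = x]\, \pi(x)\, dx + \frac{1}{n h_n} \sum_{k=1}^{n} D_{n,k} \\
& + (n h_n)^{-1} \big( \mathbb{E}[H_{h_n}(X_1, Y_1) \mid \mathcal{F}_0] - \mathbb{E}[H_{h_n}(X_{n+1}, Y_{n+1}) \mid \mathcal{F}_n] \big),
\end{align*}

where $D_{n,k} = H_{h_n}(X_k, Y_k) - \mathbb{E}[H_{h_n}(X_k, Y_k) \mid \mathcal{F}_{k-1}]$.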
Under the stated assumptions, it is a standard result of kernel estimation that

\[ \lim_{n \to \infty} \frac{1}{h_n} \int_{\mathbb{R}} K\Big( \frac{x_0 - x}{h_n} \Big)\, \mathbb{E}[\psi(Y) \mid X = x]\, \pi(x)\, dx = \mathbb{E}(\psi(Y) \mid X = x_0)\, \pi(x_0). \]

See e.g. [6] for a proof.

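We deduce from the drift condition (B1) and (8) that

\[ \sup_{n \ge 1} \mathbb{E}\big[ |H_h(X_n, Y_n)| \mid \mathcal{F}_{n-1} \big] \le C V^{1/2}(X_{n-1}), \]

for some finite constant $C$ that does not depend on $h$. Combined with (7) we get, for any $\delta > 0$,

\[ \sum_{n \ge 1} \mathbb{P}\Big( (n h_n)^{-1} \big| \mathbb{E}[H_{h_n}(X_1, Y_1) \mid \mathcal{F}_0] - \mathbb{E}[H_{h_n}(X_{n+1}, Y_{n+1}) \mid \mathcal{F}_n] \big| > \delta \Big) \le C \delta^{-2} \sum_{n \ge 1} (n h_n)^{-2} < \infty. \]

By the Borel-Cantelli lemma, this implies that the term $(n h_n)^{-1} \big( \mathbb{E}[H_{h_n}(X_1, Y_1) \mid \mathcal{F}_0] - \mathbb{E}[H_{h_n}(X_{n+1}, Y_{n+1}) \mid \mathcal{F}_n] \big)$ converges almost surely to zero.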

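Lastly, the process $\{(D_{n,k}, \mathcal{F}_k), 1 \le k \le n\}$ is a martingale-difference array. Again by (8), the
boundedness of $K$ and the drift condition (B1), $\mathbb{E}(|D_{n,k}|^2) \le \mathbb{E}[H_{h_n}^2(X_k, Y_k)] \le C\, \mathbb{E}(V(X_k))$.
Then using (7), we obtain that $\sup_{n \ge 0} \sup_{0 \le k \le n} \mathbb{E}[|D_{n,k}|^2] < \infty$. This implies, in the notation
of Theorem 1.2, that $\mathbb{E}[|S_{n,m}|^2] \le C m$, for some finite constant $C$ that does not depend
on $n$ or $m$. Moreover, we can write $D_{n,j} - D_{n-1,j} = H_{h_n}(X_j, Y_j) - H_{h_{n-1}}(X_j, Y_j) - \mathbb{E}\big( H_{h_n}(X_j, Y_j) - H_{h_{n-1}}(X_j, Y_j) \mid \mathcal{F}_{j-1} \big)$, and we note that

\[ (H_{h_n} - H_{h_{n-1}})(x, y) = \psi(y) \Big[ K\Big( \frac{x_0 - x}{h_n} \Big) - K\Big( \frac{x_0 - x}{h_{n-1}} \Big) \Big] + \sum_{i \ge 1} \bar{P}^i \big( f_{h_n} - f_{h_{n-1}} \big)(x), \]

where $(f_{h_n} - f_{h_{n-1}})(x) = \big( K\big( \frac{x_0 - x}{h_n} \big) - K\big( \frac{x_0 - x}{h_{n-1}} \big) \big)\, \mathbb{E}[\psi(Y) \mid X = x]$.

By the Lipschitz condition on $K$ and (8),

\[ \Big| \mathbb{E}[\psi(Y) \mid X = x] \Big( K\Big( \frac{x_0 - x}{h_n} \Big) - K\Big( \frac{x_0 - x}{h_{n-1}} \Big) \Big) \Big| \le C \Big| \frac{1}{h_n} - \frac{1}{h_{n-1}} \Big| V^{1/2}(x). \]

Therefore $|(H_{h_n} - H_{h_{n-1}})(x, y)| \le C \big| \frac{1}{h_n} - \frac{1}{h_{n-1}} \big| \big( (1 + |x|) |\psi(y)| + V^{1/2}(x) \big)$, from which we deduce, using (7), (8) and (10), that $\mathbb{E}\big( |D_{n,j} - D_{n-1,j}|^2 \big) = O\big( n^{2(-1 + \beta)} \big)$ uniformly in $j$. This implies, as in Proposition 1.1, that

\[ \mathbb{E}^{1/2}\Big[ \Big| \sum_{j=1}^{n-1} (D_{n,j} - D_{n-1,j}) \Big|^2 \Big] = O\big( n^{-1/2 + \beta} \big), \]

which together with $\mathbb{E}[|S_{n,m}|^2] \le C m$ proves (2), since $\beta < 1/4$. We can therefore conclude that $(n h_n)^{-1} \sum_{k=1}^{n} D_{n,k} \to 0$, $\mathbb{P}$-almost
surely, which ends the proof. □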